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We develop methodology for the estimation of the functional 
mean and the functional principal components when the functions 
form a spatial process. The data consist of curves X(sk;t),t 6 [0, T], 
observed at spatial locations si , S2 , . . . , sjv . We propose several meth- 
ods, and evaluate them by means of a simulation study. Next, we 
develop a significance test for the correlation of two such functional 
spatial fields. After validating the finite sample performance of this 
test by means of a simulation study, we apply it to determine if 
there is correlation between long-term trends in the so-called crit- 
ical ionospheric frequency and decadal changes in the direction of 
the internal magnetic field of the Earth. The test provides conclusive 
evidence for correlation, thus solving a long-standing space physics 
conjecture. This conclusion is not apparent if the spatial dependence 
of the curves is neglected. 

1. Introduction. The contribution of this paper to statistics is two-fold: 
(1) we develop estimation methodology for the functional mean and the 
functional principal components (FPCs) when the functions form a spatial 
field; (2) we propose a significance test to determine if two families of curves 
observed at the same spatial locations are uncorrelated. The contribution 
to space physics consists in solving a controversy regarding the impact of 
long-term changes in the internal magnetic field of the Earth on long-term 
ionospheric trends. The required physics background is provided later in this 
section, and in Section 8. 

The data is modeled as curves X(sk\ t),t £ [0, T], observed at spatial loca- 
tions si,S2, • ■ • , sj\r- Such functional data structures are quite common, but 
typically the spatial dependence and the spatial distribution of the points 
are not taken into account. A fundamental question is how to estimate the 
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mean function of curves indexed by spatial locations. Clearly, curves lo- 
cated at close-by points look similar and must be given smaller weights than 
curves at points far apart. In addition to the mean function, FPCs play 
a fundamental role in functional data analysis. Good estimators of FPCs 
are needed to construct reliable testing and classification procedures, but 
such issues have been addressed only in the contexts of independent curves, 
with focus on sparsity and measurement error. The geophysical data that 
motivate this research are available at fine temporal grids and are measured 
with errors that are negligible relative to the objectives of the statistical 
analysis. A focus of recent geophysical research is on the detection and es- 
timation of global and/or regional long-term trends (the global warming 
paradigm), so before a statistical analysis is undertaken, the data are typ- 
ically smoothed to remove daily or even annual periodicity. The question 
we address is how to combine the temporal trajectories available at many 
spatial locations to obtain meaningful summary trends. We argue that one 
can do better than using simple averaging. The focus of this paper is thus on 
combining information from spatially dependent curves, which are smooth 
and available at all time points. 

Many environmental and geophysical data sets fall into the framework 
considered in this paper. The data set that motivated this research consists 
of the curves of the ionospheric F2-layer critical frequency, foF2. Three such 
curves are shown in Figure 1. In principle, foF2 curves are available at over 
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Fig. 1. F2-layer critical frequency curves at three locations. Top to bottom (latitude in 
parentheses): Yakutsk (62.0), Yamagawa (31.2), Manila (14-7). The functions exhibit a 
latitudinal trend in amplitude. 
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Fig. 2. Locations of 218 ionosonde stations. Circles represent the 32 stations with the 
longest complete records. 

200 locations throughout the globe (see Figure 2), but sufficiently complete 
data are available at only 30-40 locations which are very unevenly spread; for 
example, there is a dense network of observatories over Europe and practi- 
cally no data over the oceans. The study of this data set has been motivated 
by the hypothesis of Roble and Dickinson (1989), who suggested that the 
increasing amounts of (radiative) greenhouse gases should lead to global 
cooling in mesosphere and thermosphere, as opposed to the global warming 
in lower troposphere; cf. Figure 3. Rishbeth (1990) pointed out that such 
cooling would result in a thermal contraction and the global lowering of 
the ionospheric peak densities, which can be computed from the critical fre- 
quency foF2. The last twenty years have seen very extensive research in this 
area; see Lastovicka et al. (2008) for a partial overview. One of the difficul- 
ties in determining a global trend is that the foF2 curves appear to exhibit 
trends in opposing directions over various regions. A possible explanation 
suggests that these trends are caused by long-term trends in the magnetic 
field of the Earth. There is, however, currently not agreement in the space 
physics community if this is indeed the case. In general, to make any trends 
believable, a suitable statistical modeling and a proper treatment of "errors 
and uncertainties" is called for [Ulich, Clilverd and Rishbeth (2003)]. This 
paper makes a contribution in this direction. Space physics data measured 
at terrestrial observatories always come in the form of temporal curves at 
fixed spatial locations. In Maslova et al. (2009), Maslova et al. (2010a) and 
Maslova et al. (2010b) the tools of functional data analysis were used to 
study such data, but the spatial dependence of the curves was not fully 
exploited. 
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Fig. 3. Typical profile of day time ionosphere. The curve shows electron density as a 
function of height. The right vertical axis indicates the D, E and F regions. 

Spatio-temporal modeling has received a great deal of attention of late; see 
Part V of Gelfand et al. (2010) and Chapters 3, 4 and 6 of Finkenstaedt, Held 
and Isham (2007) which discuss spatio-temporal models for geostatistical 
data. There has, however, not been much research specifically on spatially 
indexed functional data; Delicado et al. (2010) review recent contributions. 
For geostatistical functional data, several approaches to kriging have been 
proposed; see Yamanishi and Tanaka (2003), Nerini, Monestiez and Mante 
(2010), Giraldo, Delicado and Mateu (2011a) and Bel et al. (2011). 

Throughout the paper, {X(s)} denotes a random field defined on a spatial 
domain and taking values in the Hilbert space L 2 = L 2 ([0,1]) with the inner 
product 

(/,<?>= / f(t)g(t)dt, f,g£L 2 . 
Jo 

The value of the function X(s) £ L 2 at time t £ [0, 1] is denoted by X(s;t). 
We postulate the model 

oo 

(1.1) X(s;t)=fi(t) + ^ i (s)e l (t), f l (s) = (X(s)- f i,e i ), 

i=i 

where the ej form a complete orthonormal system. Note that the mean 
function \i and the FPCs do not depend on s. A sufficient condition for 
this is that the distribution in L 2 of the function X(s) does not depend on 
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the location s. A stronger sufficient condition is the strict stationarity of the 
field {X(s)}. 

For the applications we have in mind, it is enough to assume that the 
spatial domain is a subset of the plane or a two-dimensional sphere. On the 
plane, the distance between points is the usual Euclidean distance; on the 
sphere, we use the chordal distance defined as the Euclidean distance in the 
three-dimensional space. The reason for using the chordal distance is that 
any spatial covariance functions in M 3 restricted to the unit sphere is then 
also a covariance function on the sphere. Denoting the latitude by L and the 
longitude by I, the chordal distance, < d^i < 2, between two points, s/%,S£, 
on the unit sphere is given by 



(1.2) d k , 



sin 2 I — ^— — - I + cos Lfc cos Li sin 



n i/2 



For arbitrary (not necessarily spatially indexed) functions, X±,X2, ■ ■ ■ ,Xn, 
the sample mean is defined as Xjy = N^ 1 J2n=i -^n, and the sample covari- 
ance operator as 

N 

C(x) = N- 1 ]T[((X n - X N ),x)(X n - X N )}, x G L 2 

n=l 



The sample FPCs are computed as the eigenfunctions of C. These are the 
estimates produced by several software packages, including the popular R 
package fda; see Ramsay, Hooker and Graves (2009). The consistency of 
the sample mean and the sample FPCs relies on the assumption that the 
functional observations form a simple random sample. If the functions = 
X(sk) are spatially distributed, the sample mean and the sample FPCs need 
not even be consistent; see Horman and Kokoszka (2012). This happens if 
the spatial dependence is strong or if there are clusters of the points s^. We 
will demonstrate that better estimators are available and we will use them as 
part of the procedure for testing the independence of two functional fields 
{X(s),s G S} and {Y(s),s G S}. The procedure is based on the observed 
pairs of functions (X(sk), Y (&%)), 1 < k < N. The test we propose is applied 
to ionosonde (X) and magnetic (Y) curves, and conclusively shows that the 
temporal evolution of these two families is strongly correlated. 

The remainder of the paper is organized as follows. Sections 2 and 3 
focus, respectively, on the estimation of the mean function and the FPCs in 
a spatial setting. Section 4 demonstrates by means of a simulation study that 
the methods we propose improve on the standard approach, and discusses 
their relative performance and computational cost. In Section 5 we develop 
a test for the correlation of two functional spatial fields. This test requires 
estimation of a covariance tensor. After addressing this issue in Section 6, 
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we study in Section 7 the finite sample properties of several implementations 
of the test. Finally, in Section 8 we apply the methodology developed in the 
previous section to test for the correlation between the ionospheric critical 
frequency and magnetic curves. 

2. Estimation of the mean function. We propose three methods of esti- 
mating the mean function fj,, which we call Ml, M2, M3. As will become 
apparent in this section, several further variants, not discussed here, are 
conceivable. But the results of Section 4 show that while all these methods 
offer an improvement over the simple sample mean, their performance is 
comparable. We represent the observed functions as 

(2.1) X(B k ;t) = n(t)+£(s k ;t), k = 1,2, . . . ,N, 

where e is an unobservable field with Ee(s;t) = 0. All methods assume that 
the function valued field e(-) is strictly stationary and isotropic, even though 
weaker, more technical assumptions could be made for the specific methods. 
Methods Ml and M2 are akin to the kriging technique advocated by Giraldo, 
Delicado and Mateu (2011a) in that they treat the curves as single entities 
and seek to minimize the integrated mean squared error. Method M3 is 
similar in spirit to the approach to functional kriging developed by Nerini, 
Monestiez and Mante (2010) and Giraldo, Delicado and Mateu (2011b) who 
use cokriging of basis coefficients. 

Methods Ml and M2 estimate /x by the weighted average 

N 

(2.2) fi N = ^2w n X(s n ). 

n=l 

The optimal weights w k are defined to minimize E\\ J2n=i w nX{s n ) — /i|| 2 
subject to the condition Yln=l w n = ^ (IMP = J x 2 (t)dt). Using the method 
of the Lagrange multiplier, we see that the unknowns wi, W2, ■ ■ ■ , wn, r are 
solutions to the system of N + 1 equations 

N N 

(2.3) y^w n = 1, y^ j w k C kn -r = 0, n = 1,2, . . . ,N, 

n=l k=l 

where 

(2.4) C M = E[{s(s k ),s( Se ))}. 

Set w = (w\, . . . , wn) t . An easy way to solve the equations in (2.3) is to 
compute v = C" 1 !, where C = [C k g, 1 < k,£ < N], and then set w = ov, 
where a is a constant such that l T w = 1 . 

Method Ml. At each time point tj, we fit a parametric spatial model 
to the scalar field X(s;tj). To focus attention, we provide formulas for the 
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exponential model 

(2.5) Cov(X(s fc ;t i ),X(s,;i,)) = a 2 (t J )exp|-^^|. 

It is clear how they can be modified for other popular models. 
Observe that under model (2.5), 



C M = E J (X(a k ]t)-n(t))(X(a t ;t)-fi(t))dt 
Cov(X(s k ;tj),X(se;tj))dt 



[ 2 U s J d(s k ,s e )\ 
ja^e^—^-jdt. 



One way to estimate C k t is to set 
(2.6) C U = /a 2 (t)exp(-%#U 



with the estimates <7 2 (ij) and p(U) obtained using some version of empirical 
variogram, (4.7) or (4.8) in this study. 

Another way to proceed is to replace the p(tj) by their average p = m^ 1 x 
YyjLiP(tj)i where m is the count of the tj at which the variogram is esti- 
mated successfully. Then, the C k £ are approximated by 



^=(/. 2 (t )( it)exp{-^|M}. 



As explained above, in order to compute the weights Wj in (2.3), it is 
enough to know the matrix C only up to a multiplicative constant. Thus, 
we may set 

(2.7) C M = exp<^ 

I P 

Once the matrix C has been estimated, we compute the weights Wj, and 
estimate the mean via (2.2). 

If (2.6) is used, we refer to this method as Mia; if (2.7) is used, we call it 
Mlb. 

Method Ml relies on the estimation of the variograms at every point tj. 
Method M2, described below, requires only one optimization, so it is much 
faster than Ml. 

Method M2. We define the functional variogram 

2 1 (s k ,s e ) = E\\X(s k )-X(s e )\\ 2 

(2.8) = 2E\\X(s k ) - p\\ 2 - 2E[(X(s k ) - p,X(s e ) - p)] 
= 2E\\X(s)-p\\ 2 -2C ke . 
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The variogram (2.8) can be estimated by its empirical counterparts, like (4.7) 
or (4.8), with the \X(s k ) — X(se)\ replaced by 

\\X(s k )-X(s e )\\ = y(X(s k ;t)-X( S f,t)) 2 dt\ ' . 
Next, we fit a parametric model, for example, we postulate that 
(2.9) 7 (s fc ,s,) = <rj(l-exp/ d{Sk ' Se) 



Cov((Bj,X(sk)}, (Bj,X(si))) = a? exp 



Pf 

The subscript / is used to emphasize the functional variogram. Denoting 
by pf the resulting NLS estimate, we estimate the C k i by (2.7) with p 
replaced by pj. 

Method M3. This method uses a basis expansion of the functional data, 
it does not use the weighted sum (2.2). Suppose Bj, 1 < j < K, are elements 
of a functional basis with K so large that for each k 

(2.10) X(s k )f* ^ X(s fe )>£,- 

i<K 

to a good approximation. By (2.1), we obtain for every j 

(2.11) {B j ,X(s k )) = (B j ,p,) + (B j ,e{s k )), k = l,2,...,N. 

For every fixed j, the (Bj,X(s k )) form a stationary and isotropic scalar spa- 
tial field with a constant unknown mean {Bj , p) . This mean can be estimated 
by postulating a covariance structure for each (Bj,X(s k )) , for example, 

d(sk,S j) 
Pi 

The mean (Bj,p) is estimated by a weighted average of the (Bj,X(s k )) 
(the weights depend on j). Denote the resulting estimate by p,j. The mean 
function p is then estimated by p(t) = X^<k PjBj(t). 

3. Estimation of the principal components. Assume now that the mean 
function p has been estimated, and this estimate is subtracted from the data. 
To simplify the formulas, in the following we thus assume that EX(s) = 0. 

We consider analogs of methods M2 and M3. Extending Method Ml 
is possible, but presents a computational challenge because a parametric 
spatial model would need to be estimated for every pair (ti,tj). For the 
ionosonde data studied in Section 8, there are 336 points tj. Estimation on 
a single data set would be feasible, but not a simulation study based on 
thousands of replications. 

In both approaches, which we term CM2 and CM3, the FPCs are esti- 
mated by expansions of the form 

K 

(3.1) v J (t) = Y,^ ) B a (t), 

a=l 
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where the B a are elements of an orthonormal basis. We first describe an 
analog of method M3, which is conceptually and computationally simpler. 
Method CM3. The starting point is the expansion 

oo 

X(s;t) = 5>(s)£i(t), 

3=1 

where, by the orthonormality of the Bj, the £j(s) form an observable field 
£j( s k) = (Bj,X(sk)) . Using the orthonormality of the Bj again, we obtain 



C{Bj) = E 



i=i 



(3.2) 



E 



lj2Us)B a ,B j )J2Us)B l 

. \a=l 

oo 



i=l 

Thus, to estimate C, we must estimate the means i£[£j(s)£j(s)]. 

Fix i and j, and define the scalar field z by z(s) = £j(s)£j(s). We can 
postulate a parametric model for the covariance structure of the field z(-), 
and use an empirical variogram to estimate fi z = Ez(s) as a weighted average 
of the z(sf~). Denote the resulting estimate by r^. 

The empirical version of (3.2) is then 

K 

(3.3) C^H^r^. 

i=i 

Relation (3.3) defines the estimator C which acts on the span of Bj, 1 < j < K . 
Its eigenfunctions are of the form x = Yli<a<K x aB a . Observe that 

C(x) = ^ X a ^ ^iaSi = ^ ( ^2 ^ iaXa ) Bi - 



On the other hand, 



Xx = ^2 Aajj-Bj. 



Since the Bi form an orthonormal basis, we obtain 

a 

Setting 



[xi,x 2 ,...,x K \ 



R=[hjA<i,j<K], 



10 



GROMENKO, KOKOSZKA, ZHU AND SOJKA 



we can write the above as a matrix equation 

(3.4) Rx = Ax. 
Denote the solutions to (3.4) by 

(3.5) ^ = [x ( f\x^\...,x^ ) ] T , A is l<j<K. 
The x( J ) satisfy Yla=i&& = Therefore, the dj defined by 

K 

(3.6) v j = Y,^ ) B a 

a=l 

are also orthonormal (because the Bj are orthonormal) . The Vj given by (3.6) 
are the estimators of the FPCs, and the A,- in (3.5) of the corresponding 
eigenvalues. 

As in method M3, the value of K can be taken to the number of basis 
functions used to create the functional objects in R, so it can be a relatively 
large number, for example, K = 49. Even though the range of j in (3.5) 
and (3.6) runs up to K, only the first few estimated FPCs Vj would be used 
in further work. 

Method CM2. Recall that under the assumption of zero mean function, 
the covariance operator is defined by C{x) = E[{X(s), x)X(s)]. It can be 
estimated by the simple average 

TV N 

(3-7) -^(X(s n ),-)X(s n ) = -^C fc , 

n=l n=l 

where C k is the operator defined by 

C k (x) = (X(s k ),x)X(s k ). 

As for the mean, more precise estimates can be obtained by using the 
weighted average 

N 

(3.8) C = Y,WkC k . 

k=l 

Before discussing the estimation of the weights w k , we comment that 
the FPCs Vj and their eigenvalues Xj can be estimated using (3.8) and the 
representation (3.1). As in method CM3, set x = Xa<«<x x a B a , and observe 
that 



r ( K \ 

C (x) =/A / I S ja X a Bj, 
j=l \a=l J 



where 

N 

Sja = y~]w k {Xk,Bj){Xk,B a ) 



k=l 
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Thus, analogously to (3.4), we obtain a matrix equation Sx = Ax, from 
which the estimates of the Vj,Xj can be found as in (3.5) and (3.6). 

We now return to the estimation of the weights Wf- in (3.8). One way to 
define the optimal weights is to require that they minimize the expected 
Hilbert-Schmidt norm of C — C. Recall that the Hilbert-Schmidt norm of 
an operator K is defined by 

oo oo „ 

ll^ll5 = Ell J ^)H 2 = E / \K{em? dt, 

i=l i=l J 

where > 1} is any orthonormal basis in I? . Since || • ||s is a norm in the 
the Hilbert space S of the Hilbert-Schmidt operators with the inner product 

oo 

{K l ,K 2 ) s = Y J (K l (e t ),K 2 {e i )), 

i=l 

we can repeat all algebraic manipulations needed to obtain the weight Wi in 
(2.2). The optimal weights in (3.8) thus satisfy 

N N 

(3.9) ^]w n = l, ^WkKkn- r = 0, n = 1,2, . . . ,N, 

n=l k=l 

where 

K M = E[(C k -C,C e -C)s]. 

Finding the weights thus reduces to estimating the expected inner prod- 
ucts Kk£- 

Since method M2 of Section 2 relies only on estimating inner product 
in the Hilbert space L 2 , it can be extended to the Hilbert space S. First 
observe that, analogously to (2.8), 

E\\C k - Ce\\ 2 s = 2£||C fe - C\\l - 2k u . 

We can estimate the variogram 

lc {d) = E\\ (X(s), -)X(s) - (X(s + d), -)X(b + d)|||, d = ||d|| 

by fitting a parametric model. In formulas (4.7) and (4.8), the squared dis- 
tances (X(sf c ) — X(si)) 2 must be replaced by the squared norms \\Ck — Ce\\g. 
These norms are equal to 

oo „ 

\\Ck - c t \\l = E / (fikX k (t) - f ie x e (t)f dt, 

where 

fik = J X k (t) ei (t)dt. 
The inner products can be computed using the R package f da. 
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4. Finite sample performance of the estimators. In this section we report 
the results of a simulation study designed to compare the performance of 
the methods proposed in Sections 2 and 3 in a realistic setting motivated 
by the ionosonde data. It is difficult to design an exhaustive simulation 
study due to the number of possible combinations of the point distributions, 
dependence structures, shapes of mean functions and the FPCs and ways of 
implementing the methods (choice of spatial models, variogram estimation 
etc.). We do, however, think that our study provides useful information and 
guidance for practical application of the proposed methodology. 

Data generating processes. We generate functional data at location s k as 

p 

(4.1) X(s k ;t)=^t) + ^2^ l (s k )e l (t), 

i=i 

where the e, are orthonormal functions; cf. model (1.1). 
To evaluate the estimators of the mean, we use p = 2 and 

(4.2) ei(t) =V2sin(27rf-6) > e 2 (t) = \/2sin(27rt/2). 
We use two mean functions 

(4.3) /j,(t) = aV2sm(2irt-3), a = 2, 
and 

(4.4) fi(t) = aVtsm(2irt ■ 3), a = l. 

The mean function (4.3) resembles the mean shape for the ionosonde data. 
It is, however, a member of the Fourier basis, and can be isolated using only 
one basis function, what could possibly artificially enhance the performance 
of method M3. We therefore also consider the mean function (4.4). Combin- 
ing the mean function (4.3) and the FPCs (4.2), we obtain functions which 
very closely resemble the shapes of the ionosonde curves. In the above for- 
mulas, time is rescaled so that t £ [0, 1]. 

To evaluate the estimators of the FPCs, we use p = 3 and 

(4.5) X(s k -t) = ei(s fc ) ei(t) + e2(t) + e 2 (s fe )e 3 (t), 

where ei(i) = V2sm(2irt • 7), e 2 (i) = v^sin^Trt • 2), e 3 (t) = \/2sin(37rt • 3). 
Direct verification, which uses the independence of the fields £i and £2, shows 
that the FPCs are v\ = 2~ l / 2 (e\ + e 2 ) and v 2 = 2 3 (for the parameters of 
the & specified below). 

To complete the description of the data generating processes, we must 
specify the dependence structure of the scalar spatial fields £1 and £2- A com- 
mon assumption for the Karhunen-Loeve expansions used in statistical in- 
ference is that the score processes £j are independent, and this is what we 
assume. We use the exponential and Gaussian models (8.4) with chordal 
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distances (1.2) between the locations described below. To make simulated 
data look similar to the real foF2 data, we chose o\ = 1, p\ = tt/6 for £i(s) 
field and o"2 = 0.1, p2 = 7r/4 for ^(s) field. 

The locations s*. are selected to match the locations of the real ionosonde 
stations. For the sample size 218 we use all available locations, as shown 
in Figure 2. Size 32 corresponds to the ionosondes with the longest record 
history. We also consider a sample of size 100; the 100 stations were selected 
randomly out of the 218 stations. 

Details of implementation. All methods require the specification of a para- 
metric spatial model for the variogram. Even though for some methods the 
variograms are defined for L 2 - or 5-valued objects, only a scalar model is re- 
quired. In this simulation study we use the exponential and Gaussian models. 

Methods M3, CM2 and CM3 require the specification of a basis {Bj} 
and the number K of the basis functions. We use the Fourier basis and 
K = 1 + 4[y where is the count of the points at which the 

curves are observed. For our real and simulated data K = 1 + 4[\/336] = 73, 
a number that falls between the recommended values of 49 and 99 for the 
number of basis functions. Specifically, the basis functions Bj are 

(4.6) {1, v^sin^Trf • i), V2 cos(2vrt • = 1, 2, . . . , 36}. 

All methods require the estimation of a parametric model on an empirical 
variogram. There are several versions of the empirical variogram for scalar 
fields. The classical estimator proposed by Matheron is given by 

(4.7) m = — J— (*( s *o - *( s 0) 2 > 

where N(d) = {(sj, Sj) : d Si)Sj = d; i,j = 1, . . . , N} and 1^(^)1 is the number of 
distinct pairs in N(d). A robust estimator proposed by Cressie and Hawkins 
is defined as 

<"> ^ - (im^ XM - ^>' I/2 )7 (°- 457 + wm)- 

For details, we refer to Section 4.4 of Schabenberger and Gotway (2005), 
where other ways of variogram estimation are also discussed. In our study 
we use only estimators (4.7) and (4.8), and refer to them, respectively, as 
MT and CH. 

Results of the simulation study. For comparison of different methods we 
introduce the quantity L which is the average of the integrated absolute 
differences between real and estimated mean functions or FPCs. For the 
mean function, L is defined by 

R 

(4.9) L = ^E / IM*)-M*)l<ft. 
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N=32 N=100 N=218 




Fig. 4. Errors in the estimation of the mean function for sample sizes: 32, 100, 218. The 
dashed boxes are estimates using the CH variogram, empty are for the MT variogram. The 
rightmost box for each N corresponds to the simple average. The bold line inside each box 
plot represents the average value of L (4-9). The upper and lower sides of rectangles show 
one standard deviation, and horizontal lines show two standard deviations. The rightmost 
boxes correspond to the standard method. 

where R is the number of replications; we use R = W 3 . For the FPCs the 
definition is fully analogous. We also compute the standard deviation for L, 
based on the normal approximation for R independent runs. 

The results of the simulations for the mean function (4.4) are shown in 
Figure 4. The data generating processes have exponential covariance func- 
tions. If the £j in (4.1) have Gaussian covariances, the results are not visu- 
ally distinguishable. The errors values for mean (4.3) are slightly different, 
but the relative position of the box plots practically does not change. All 
methods Ml, M2 and M3 are significantly better than the sample average. 
Method M2 strikes the best balance between the computational cost and 
the precision of estimation. Note that methods Ml and M2 were designed 
to minimizes the expected L 2 distance, and all three methods are compared 
using the L 1 distance, so this comparison does not a priori favor them. In 
the context of forecasting, using different loss functions to evaluate the fore- 
casts than to design them can lead to spurious conclusions; see Gneiting 
(2011). In our context, if the L 2 distance is used to compare the methods, 
the ranking and conclusions are the same. 

Errors in the estimation of the FPCs in model (4.5) are shown in Figure 5. 
The displayed errors are those for the £j with exponential covariances and the 
CH variogram. The results for Gaussian covariances and the MT variogram 
are practically the same. The performance of methods CM2 and CM3 is 
comparable, and they are both much better than using the eigenfunctions 
of the empirical covariance operator (3.7), which is the standard method 
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Fig. 5. Errors in the estimation of the FPCs for sample sizes: 32, 100, 218. The bold 
line inside each box plot represents the average value of L. The upper and lower sides of 
rectangles show one standard deviation, and horizontal lines show two standard deviations. 
The rightmost boxes correspond to the standard method. 

implemented in the fda package. The computational complexity of methods 
CM2 and CM3 is the same. 

Conclusions. For simulated data generated to resemble the ionosonde 
data, all methods introduced in Sections 2 and 3 have integrated absolute de- 
viations (away from a true curve) statistically significantly smaller than the 
standard methods designed for i.i.d. curves. Methods M2 and CM2, based 
on weighted averages estimated using functional variograms, offer a com- 
putationally efficient and unified approach to the estimation of the mean 
function and of the FPCs in this spatial setting. 

5. Testing for correlation of two spatial fields. Motivated by the problem 
of testing for correlation between foF2 and magnetic curves, described in 
detail in Section 8, we now propose a relevant statistical significance test. 

There are N spatial locations: si,S2, . . . ,sjv- At location s k , we have two 
curves: 

X k =X(s k )=X(s k ;t), i€[0,l], 

and 

Y k = Y(s k ) = Y(s k ;t), te[0,l]- 

We want to test the null hypothesis that the collections of curves {X k , 1 < 
k < N} and {Y k , 1 < k < N} are uncorrelated in a sense defined below. The 
null distribution is derived assuming a stronger condition that these two 
families are independent. Szekely, Rizzo and Bakirov (2007) and Szekely and 
Rizzo (2009) introduced measures of dependence based on the distance of 
characteristic functions which allow to test independence (rather than just 
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lack of correlation) of random variables X and Y given a sample of i.i.d. 
observations (XkXk)- The extension of their theory to the case of spatially 
dependent observations (XkXk) is not obvious, so we consider only a test 
for linear dependence. 

The idea of the test is as follows. To lighten the notation, assume that 

EX k (t) = and EY n {t) = 0. 

The mean functions will be estimated and subtracted using one of the meth- 
ods of Section 2. We approximate the curves X n and Y n by the expansions 

v i 

,Uj)Uj(t), 

i=l j=l 

where the Vi and the Uj are the corresponding FPCs. At this point, the func- 
tions Vi, 1 < i < p and Uj, 1 < j < q are deterministic, so the independence of 
the curves X n of the curves Y n implies the independence of the vectors 

[(X n , Vl ), (X n ,v 2 ), . ..,(X n ,v p )f, l<n<N 

and 

[(Y n , Ul ), (Y n , ua), . . . , 0r n , u q )] T , 1 < n < N. 
Then, under Hq, the expected value of the sample covariances 

1 N 

(5.1) A N (i,j) = — ^2{X n ,v i )(Y n ,u j ) 

n=l 

is zero. If their estimated versions are large as a group, that is, if some of 
the estimated A^(i,j) are too large, we reject the null hypothesis. 
To construct a test statistic, we introduce the quantities 

V M (i,i')=E[(v i ,X k ){v i ,,X £ )}, U ke (j,j') = E[(u j ,Yk)(u' j ,m- 

Note that Vki(i, i') = and Uki(j,j') = 0, if the observations in each sam- 
ple are independent (and have mean zero). Thus, the Vke(i,i') and the 
Uki(j,j') are specific to dependent data, they do not occur in the cur- 
rently available testing procedures developed for independent curves. Setting 
Xik = {vi,Xk},Yjk = (ujXk), observe that if the Xn- are uncorrelated with 
the Yjk, then 

' N N 

/] XikYjk XiigYj'l 
k=\ i=i 

. N N 

= E[XikXi>£\EXjiXj'i] 
k=i i=\ 

N N 
k=l 1=1 



E [VnAn \ZnAn (i',f)) = ^E 
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The normalized covariance tensor of the y/N A^(i,j) thus has entries 

1 N 

(5.2) a N (i,j;i',j') = - £ ^(mO^O"./)- 

k,e=i 

The idea of the test is to approximate the distribution of the matrix 
A N = [A N (i,j), l<i<p,l<j<q] 

via x/~N Ajv ~ Z, where Z is a j) x g Gaussian matrix whose elements have 
covariances E[Z(iJ)Z(i',j')] = a N (i,j;i' ,f). 

We now explain how to implement this idea. Denote by X%,jj and Vi,Uj 
the eigenvalues and the eigenfunctions estimated either by method CM2 or 
CM3. The covariances Aj\[(i,j) are then estimated by 

1 N 

An{%3) = Jj/^2{ X n,Vi){Yn,Uj). 
n=l 

If the observations within each sample are independent, an appropriate 
test statistic is 

i=l j=l 

Since Aj = E[(vi,X) 2 ], this is essentially the sum of all correlations, and it 
tends to a chi-squared distribution with pq degrees of freedom, as shown in 
Kokoszka et al. (2008). This is, however, not that case for dependent data. 
To explain, set 

a7 v = vec(ATv), 

that is, ajv is a column vector of length pq consisting of the columns of A^r 
stacked on top of each other, starting with the first column. Then y/N&N is 
approximated by a Gaussian vector z with covariance matrix XI constructed 
from the entries (5.2). It follows that 

(5.3) S N = Nk N i: ajy ~ xjLp 
where a^r = vec(ATv). The entries of the matrix S are 

1 - 

(5.4) &N(i,j;i',j') = ^ E Vke(i,i')Uke(j,j'), 

k,£=l 

where Vki(i,i') and Uki(j,j') are estimators of Vki(i,i') and Uke(j,j'), re- 
spectively. The test rejects Hq if Sn > x^ q (l — a), where Xp g (^ ~ a ) IS the 
100(1 — a)th percentile of the chi-squared distribution with pq degrees of 
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freedom. One can use Monte Carlo versions of the above test, for example, 
the test is based on the approximation 

(5.5) fjv := iVa^-ajv ~ w T £w, 

where the components of w are i.i.d. standard normal. 
The test procedure can be summarized as follows: 

(1) Subtract the mean functions, estimated by one of the methods of 
Section 2, from both samples. 

(2) Estimate the FPCs by method CM2 or CM3. 

(3) Using a model for the covariance tensor (5.4) (see Section 6), compute 
the test statistic Sjy. (This tensor is not needed to compute T^r, but it is 
needed to find its Monte Carlo distribution.) 

(4) Find the P-value using either a Monte Carlo distribution or the \ 2 
approximation . 

We now turn to the important issue of modeling and estimation of the 
matrix 5]. 

6. Modeling and estimation of the covariance tensor. The estimation of 
the Vke(i,i') involves only the X n , and the estimation of the Uki{j,j') only 
the Y n , so we describe only the procedure for the Vki(i,i'). We assume that 
the mean has been estimated and subtracted, so that we can define 

(6.1) C h (x) = E[(X(s),x)X(s + h)}, h=\\h\\. 
The estimation of the Vke(i,i f ) relies on the identity 

Vke(i,i') = {C h (vi),Vi>), h = d(s k ,si). 

To propose a practical approach to the estimation of XI, we consider an 
extension of the multivariate intrinsic model; see, for example, Chapter 22 
of Wackernagel (2003). A most direct extension is to assume that 

(6.2) C h = Cr(h), 

where C is a covariance operator, that is, a symmetric positive definite 
operator with summable eigenvalues, and r{h) is a correlation function of a 
scalar random field. Since r(0) = 1, we have C = Co, so C in (6.2) must be 
the covariance operator of each X(s). If we assume the intrinsic model (6.2), 
then 

(6.3) Vki(i,j) = (r(h)C(vi),Vj) = AAjK d ( s ifc> s 0)- 
To allow more modeling flexibility, we postulate that 



(6.4) 



Vke(i,j) = XiSijri(d(s k ,Si)). 
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Under (6.3) [equivalently, under (6.2)], each scalar field (X(s),Vi) has the 
same correlation function, only their variances are different. Under (6.4), 
the fields (X(s),Vi) can have different correlation functions. As will be seen 
below, model (6.4) also leads to a valid covariance matrix. 

The correlations ri(d(sfc, Sj)) and the variances Aj can be estimated using 
a parametric model for the scalar field £i(s) = (X(s),Vi). The resulting esti- 
mates rj(d(sfc,sj)) and Aj lead to the estimates Vke(i,j) via (6.4). Analogous 
estimates of the functional field Y are Jj(d(8k,s{)),fj and U^(i,j). 

For ease of reference, we note that under model (6.4) and Hq, the covari- 
ance tensor, 



N N 



N 

k=l 1=1 



has the following matrix representation: 

/ N N N N N 

(6.5) 53 = diag ]T ]T 5^ ^ (M),-,EE^ (*> £)± Vq (k, I) 



yk = l 1=1 k = l 1=1 



%.(/c,f) = -=Xifi(d(s k ,8£)) 
v -/V 



where 



and 



This form is used to construct the Monte Carlo tests discussed in Section 7. 

The matrices 51 and 51 are positive definite; see Horvath and Kokoszka 
(2012) for the verification. 

7. Size and power of the correlation test. As in Section 4, our objective is 
to evaluate the finite sample performance of the test introduced in Section 5 
in a realistic setting geared toward the application presented in Section 8. 

Data generating processes. We generate samples of zero mean Gaussian 
processes 

v i 
(7.1) X(s;i)=^&(sK(i); Y(s;t) = J>,(sK(i). 

i=l j=l 

The process X is designed to resemble in distribution appropriately trans- 
formed and centered foF2 curves; the process Y the centered magnetic 
curves. Following the derivation presented in Section 8, we use p = 7 and 
q = l. The curves Vi and u\ are the estimated FPCs of the real data. The 
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Table 1 

Models and estimated covariance parameters for the transformed foF2 curves and the 

magnetic curves 



Spatial 
field 


Model 




Parameters 




Co 




P 


7] 


Gaussian 




5.99 ± 0.48 


0.32 ±0.04 


ii 


Gaussian 




20.05 ± 2.20 


0.12 ±0.03 








3.30 ±0.43 






Exponential 




2.63 ±0.52 


0.16 ±0.07 




Gaussian 




2.66 ±0.39 


0.18 ±0.05 








2.74 ±0.32 






Gaussian 


0.16 ±0.02 


0.85 ±0.24 


0.17 ±0.06 








1.22 ±0.18 





scalar Gaussian spatial fields £j and 771 follow parametric models estimated 
for real data; details of the models are presented in Table 1 . The £j are inde- 
pendent. Under Hq, the & are independent of 771. The dependence under Ha 
can be generated in many ways. We considered the following scenarios: £1 
and 771 are dependent, £j and 771 are independent for i 7^ 1, then £2 and 771 
are dependent, £j and 771 are independent for i ^ 2, etc. To produce two 
dependent spatial fields £j and 77, we generated iV i.i.d. pairs Xj = [xu, X2i] T , 
1 < i < N, where 

—Hi ?))■ 

Then we merged all xu into vector yi = [xn, . . . , xin] t and all into 
vector y2 = [X21, ■ ■ ■ , X2n] T • Performing the Cholesky rotation, we obtain 
correlated spatial vectors: 

6 = Vyi, (S ei =VV T ), r? = Uy 2 , (S, = UU T ). 

We used sample sizes N = 32 and A r = 100 corresponding to the locations 
determined as in Section 4. 

Testing procedures. We studied the finite sample behavior of three meth- 
ods, which we call S, SM and T. Method S rejects Hq if the statistic Sjy (5.3) 
exceeds a chi-square critical value. Method SM uses a Monte Carlo distribu- 
tion of the statistic SV: after estimating all parameters from the data and 
assuming the Gaussian distribution of the fields £j and 771 , we can replicate 
the values of the statistic Sn under Hq using the covariance matrix (6.5). 
Method T uses the statistics T/v (5.5), and approximates its distribution by 
the Monte Carlo distribution of w T 5]w, as explained in Section 5. For deter- 
mining the critical values in methods SM and T, we used 10 7 Monte Carlo 
replications. The empirical size and power are based on 10 5 independent 
runs. 



SPATIALLY INDEXED CURVES AND IONOSPHERIC TRENDS 



21 




Fig. 6. Size of the correlation test as a function of p. Solid disks represent method S 
(based on \ 2 distribution). Circles represent method SM (based on the Monte Carlo dis- 
tribution). 



Conclusions. As Figure 6 shows, the empirical size is higher than the nom- 
inal size, and it tends to increase with the number p of principal components 
used to construct the test, especially for N = 32. The usual recommenda- 
tion is to use p, which explains about 85% of the variance. For the foF2 data 
with N = 32, this corresponds to p = 4. Applied to real data in Section 8, 
all tests (S, SM and T) lead to extremely strong rejections, so the inflated 
empirical size is not a problem. Figure 6 also shows that the Monte Carlo 
approximation is useful for N = 32, this is the sample size we must use in 
Section 8. The size of test T is practically indistinguishable from that of test 
SM. Figure 7 shows the power of method SM; power curves for method T 
are practically the same, method S has higher power. The simulation study 
shows that a strong rejection when the test is applied to real data can be 
viewed as reliable evidence of dependence. 

8. Application to critical ionospheric frequency and magnetic curves. In 

this section we apply the correlation test, which uses the estimation method- 
ology of Sections 2 and 3, to foF2 and magnetic curves. 
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0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 

P P 

Fig. 7. Power of the correlation test SM as a function of the population correlation p. 
Each line represents one of the four possible correlated spatial fields: £ x — r], £ 2 ~Wi £3 ~ r 1: 
£ 4 — r\. The test was performed using p — 4, which explains about 85% of variance of the 
foF2 curves. Since all curves in the graphs are practically the same, we do not specify 
which curve represents a particular dependent pair £ 4 — r\. 

Description of the data. The F2 layer of the ionosphere is the upper part 
of the F layer shown in Figure 3. The F2 layer electron critical frequency, 
foF2, is measured using an instrument called the ionosonde, a type of radar. 
The foF2 frequency is used to estimate the location of the peak electron 
density, so an foF2 trend corresponds to a trend in the average height of 
the ionosphere over a spatial location. The foF2 data have therefore been 
used to test the hypothesis of ionospheric global cooling discussed in the 
Introduction. Hourly values of foF2 are available from the SPIDR database 
http://spidr.ngdc.noaa.gov/spidr/ for more than 200 ionosondes. We 
use monthly averages for 32 selected ionosondes, with sufficiently complete 
records, for the period 1964-1992. Their locations are shown in Figure 2. 
Three typical foF2 curves are shown in Figure 1. We omit the details of 
the procedure for obtaining curves like those shown in Figure 1, but we 
emphasize that it requires a great deal of work. In particular, the SPIDR data 
suffer from two problems. First, for some data, the amplitude is artificially 
magnified ten times, and needs to be converted into standard units (MHz). 
Second, in many cases, missing observations are not replaced by the standard 
notation 9999, but rather just skipped. Thus, if one wants to use equally- 
spaced time series, skipped data must be found and replaced by missing 
values. For filling in missing values, we perform linear interpolation. We 
developed a customized C++ code to handle these issues. We emphasize 
that one of the reasons why this global data set has not been analyzed so 
far is that useable data have been derived only over relatively small regions, 
for example, Western Europe, and more often for a single location. 

As explained in Section 1, the foF2 data are used to test hypotheses on 
long-term ionospheric trends. We thus removed annual and higher frequency 
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variations using 16 month averaging with MODWT filter; see Chapter 5 of 
Percival and Walden (2000). This leads to 32 time series at different loca- 
tions, each containing 336 equally-spaced temporal observations. The am- 
plitude of the foF2 curves exhibits a nonlinear latitudinal trend; it decreases 
as the latitude increases; see Figure 1. To remove this trend, which may po- 
tentially bias the test, we assume that the foF2 signal, F(s;t), at location s 
follows the model 

(8.1) F(s;t) = G(L( S ))X( S ;t), 

where X(s;t) is a constant amplitude field, and G(-) is a scaling function 
which depends only on the magnetic latitude L (in radians). Since the trend 
in the amplitude of F(s; t) is caused by the solar radiation which is nonlin- 
early proportional to the zenith angle, we postulate that the function G(-) 
has the form 

(8.2) G(L) = a + 6cos c (L). 

The parameters a, b, c are estimated as follows. Let so be the position of 
the ionosonde closest to the magnetic equator. For identifiability, we set 
G(L(so)) = 1. For the remaining locations s^, we compute G?(L(sfc)) as the 
average, over all 336 time points tj of the ratio F(sk;tj)/F(so;tj). Figure 8 
shows these ratios as a function of the magnetic and geographic latitude. The 
ratios in the magnetic latitude show much less spread, and this is another 
reason why we work with the magnetic latitude. The curve G(L) (8.2) is 
fitted to the 67(L(sfc)) in magnetic latitude by nonlinear least squares. The 
fitted values are a = 0.5495, b = 0.4488, c = 4.2631. 
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Fig. 8. Dots represent the scaling function GL(si) in the magnetic coordinate system 
and crosses are the same in the geographic coordinate system. Line is the best fit for Gl 
in the magnetic coordinate system. 
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We now describe how we construct the curves that reflect the relevant 
long-term changes in the internal magnetic field of the earth. The height of 
the F2 layer (and so the foF2 frequency) can be affected by a vertical plasma 
drift which responds to the magnetic field. The vertical plasma drift is due 
to the wind effect, and is given by [we use the same notation as in Mikhailov 
and Marin (2001)] 

W = (V nx cos D — V ny sin D) sin / cos / + V nz sin 2 1. 

In the above formula, V nx , V ny and V nz are, respectively, meridional (parallel 
to constant longitude lines), zonal (parallel to constant latitude lines) and 
vertical components of the thermospheric neutral wind; / and D are incli- 
nation and declination of the earth magnetic field. Detailed figures are pro- 
vided in Chapter 13 of Kivelson and Russell (1997). Usually V nz <C V nx , V ny , 
and assuming that the difference between magnetic and geographic coor- 
dinates, D, is small (at least for low- and mid-latitude regions), we can 
simplify the above formula to W = sin /cos/. Thus, only the meridional 
thermospheric wind is significant. Measuring neutral wind components (V nx , 
Vnyi V nz ) is difficult, and long-term wind records are not available. We there- 
fore replace V nx by its average. For the test of correlation, the specific value 
of this average plays no role, so we define the magnetic curves as 

(8.3) Y(s;t) =sin/(s;t)cos/(s;i). 

The curves I(s;t) are computed using the international geomagnetic refer- 
ence field (IGRF); the software is available at http : //www . ngdc . noaa . gov/ 
IAGA/vmod/. 

The test is applied to the curves X(sk;t) defined by (8.1) and (8.2), and 
to the curves Y(sk]t) defined by (8.3). 

Application of the correlation test. We first estimate and subtract the 
mean functions of the fields X(sk) and l^(sfc) using method M2 (the other 
spatial methods give practically the same estimates). The principal com- 
ponents Vi and Ui are estimated using method CM2 (method CM3 gives 
practically the same curves). 

We apply the test, for all 1 < p < 7 and q = l. The first seven eigenvalues 
of the field X [computed per (3.4) or its analog for method CM2] explain 
about 95% of the variance. The first eigenvalue of the field Y explains about 
99% of the variance. The eigenfunction u\ is approximately equal to the 
linear function: u\{t) ~ t. This means that at any location, after remov- 
ing the average, the magnetic field either linearly increases or decreases, 
with slopes depending on the location; see Figure 9. To lighten the nota- 
tion, we drop the "hats" from the estimated scores and denote the zero 
mean vector [a^(si), . . . , &(stv)] t by and [771 (si ),..., r/i(syv)] T by rj. The 
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Fig. 9. Transformed and centered foF2 curves (continuous) and centered magnetic curves 
(dashed) at 32 locations denoted with circles in Figure 2. The scales for the two families of 
curves are different. The foF2 curves have the same scale, it is shown on the right vertical 
axes in MHz. The scale of the magnetic curves changes, it is shown on the right vertical 
axes in each box (unitless). 
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Table 2 

P '-values of the correlation tests applied to the transformed foF2 data. The first column 
shows the number of FPGs, the second column shows cumulative variances computed as 
the ratios of the eigenvalues estimated using method CMS. Testing procedures S, SM and 
T are defined in Section 7. The "simple" procedure neglects the spatial dependence of the 

curves 



Spatial 



p 


CV, % 




S 


SM 


T 


Simple 


1 


47.88 


6.22 


10~ 5 


3.05 ■ 10" 4 


3.05 • 10" 4 


0.035 


2 


62.59 


3.26 


10~ 6 


2.91 ■ 10" 4 


2.99 • 10" 4 


0.095 


3 


73.67 


4.53 


10~ 8 


2.43 ■ 10~ 4 


2.32 • 10~ 4 


0.043 


4 


84.40 


1.47 


10 -38 


1.6- 10" 7 


2.24 • 10" 5 


0.039 


5 


88.70 


4.95 


1Q -26 


2.6 • 10 -7 


2.27 • 10" 5 


0.046 


6 


92.21 


6.73 


lO" 27 


5.9- 10" 7 


2.21 ■ 10" 5 


0.060 


7 


94.57 


2.12 


1Q -32 


1.6- 10~ 7 


1.92 ■ 10~ 5 


0.030 



covariances and are estimated using parametric spatial models de- 
termined by the inspection of the empirical variograms. In this application, 
it is sufficient to use two covariance models: 

Gaussian: c(s&, s^) = cq + a 2 exp{— d 2 (k, £)/p 2 }, 

(8.4) 

Exponential: c(sfc,s^) = cq + <j 2 exp{— d(k,£)/p}. 

When the scores do not have a spatial structure, we use the sample variance 
(flat variogram). The estimated models and their parameters are listed in 
Table 1. 

The P- values for different numbers of FPCs 1 < p < 7 are summarized in 
Table 2. Independent of p and a specific implementation of the test, all P- 
values are very small, and so the rejection of the null hypothesis is conclusive; 
we conclude that there is a statistically significant correlation between the 
foF2 curves X(sk) and the magnetic curves ^(s*.). We also applied a version 
of our test which neglects any spatial dependence, this is the test proposed 
by Kokoszka et al. (2008). The P- values hover around the 5% level, but still 
point toward rejection. The evidence is, however, much less clear cut. This 
may partially explain why this issue has been a matter of much debate in the 
space physics community. The correlation between the foF2 and magnetic 
curves is far from obvious. Figure 9 shows these pairs at all 32 locations. It 
is hard to conclude by eye that the direction of the magnetic field change 
impacts the foF2 curves. 

Discussion. A very important role in our analysis is played by the trans- 
formation (8.2). Applying the test to the original foF2 curves, F(sk;t), gives 
the P-values 0.209 (p = 1) and 0.011 (p = 2) for the spatial S test, and 0.707 
(p = 1), 0.185 (p = 2), 0.139 (p = 3) for the "simple" test. As explained above, 
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Fig. 10. Scatter plots of the scores — 1,2,3,4 of the foF2 curves, vertical axes, 
against the scores r) of the magnetic curves, horizontal axes. 



the amplitude of the field F(si-;t) evolves with the latitude. This invalidates 
the assumption of a mean function which is independent of the spatial loca- 
tion. Thus, even for the spatial test, the mean function confounds the first 
FPC. However, the spatial estimation of the mean function and of the FPCs 
"quickly corrects" for the violation of assumptions, and the null hypothesis 
is rejected for p > 2. When the spatial structure is neglected (and no latitu- 
dinal transformation is applied) no correlation between the foF2 curves and 
magnetic curves is found. 

The rejection of the null hypothesis means that after adjusting the foF2 
curves for the latitude and the global mean, their regional variability is corre- 
lated with the regional changes in the magnetic field. This conclusion agrees 
with recent space physics research [see Cnossen and Richmond (2008) and 
Lastovicka (2009)], and can, to some extent, be visually confirmed, post- 
analysis, by the examination of the scatter plots shown in Figure 10. It 
implies that long-term magnetic trends must be considered as additional 
covariates in testing for long-term trends in the foF2 curves. The main co- 
variate is the solar activity which drives the shape of the mean function, 
but, as explained in the Introduction, the impact of the concentration of 
the greenhouse gases is of particular interest; see Qian et al. (2009), among 
many other contributions. 

A broader conclusion of the work presented in this paper is that methods 
of functional data analysis must be applied with care to curves obtained at 
spatial locations. Neglecting the spatial dependence can lead to incorrect 
conclusions and biased estimates. The same applies to space physics re- 
search. If trends or models are estimated separately at each spatial location, 
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one should not rely on results obtained by some form of a simple averaging. 
This is, however, the prevailing approach. Interestingly, the results related 
to global ionospheric trends are often on the borderline of statistical signifi- 
cance. Standard i-tests lead either to rejection or acceptance, depending on 
a specific method used (a similar phenomenon is observed in the last column 
of Table 2). It is hoped that the methodology developed in this paper will 
be useful in addressing such issues. 
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